PLOS Computational Biology
● Public Library of Science (PLoS)
Preprints posted in the last 30 days, ranked by how well they match PLOS Computational Biology's content profile, based on 1633 papers previously published here. The average preprint has a 1.32% match score for this journal, so anything above that is already an above-average fit.
Geisler, W. S.
Perceptual systems in humans and many other animals are able to segment scenes into regions that are likely to be physically meaningful. This ability depends on having low-level mechanisms that can accurately categorize whether local image patches are samples from the same or different kinds of texture. We find that using spatial proximity as a proxy for same-different ground truth makes it possible to train accurate decision variables and bounds directly from arbitrary natural images with no feedback. We also find that performance can be further improved by using proximity as a ground truth for adjusting the final decision variables and bounds for the current image/scene. These surprising findings result from the simple fact that under a wide range of conditions proximity discrimination (near vs. far) and texture discrimination (same vs. different) have mathematically identical decision bounds if the same image features are used for both tasks. We used the decision variables and bounds trained on natural images as the initial steps in a hierarchical Bayesian observer (HBO) model of texture discrimination [9]. Given the relative simplicity of this HBO model, it did an excellent job of segmenting images having randomly shaped regions containing arbitrary natural textures. We suggest that the proximity proxy is something that natural selection could discover and exploit for any same-different task where the task-relevant stimulus features also vary systematically with distance in space and/or time. For example, natural selection could have created developmental learning/plasticity mechanisms that exploit the proximity proxy.
Rennert, E.; Behera, A. K.; Qiu, Y.; Vaikuntanathan, S.
Generative diffusion models have demonstrated an ability to produce novel images sampled from the learned underlying data distribution. These models are able to infer system characteristics for parameter combinations that were not seen during training. We investigate the ability of these models to infer trends in biological data from limited samples. Specifically, we consider the response of system-scale behaviors such as cortical flow in a simulated actomyosin system as we tune filament turnover rates. We train a diffusion model on coarse-grained actin curvature and density heatmap images, and are able to generate images from conditioning variables not seen during training. These images are predictive of nonlinear trends in the system. We also examine the characteristics of the system that allow this level of inference, such as the strong linear relationship between average density and filament turnover, and explore minimal underlying dynamics with a motor binding model.
Riegner, G.; Schwartzman, A.; Reinagel, P.
Decision-making behavior changes over time, exhibiting temporal correlation and nonstationarity. Existing drift diffusion model (DDM) fitting methods either do not provide uncertainty quantification for parameter estimates, or rely on restrictive assumptions that decisions are independent and that parameters remain constant over time, potentially underestimating uncertainty. To address these limitations, we propose a computationally efficient method for estimating analytic uncertainties in DDM parameters that are robust to temporal dependence and unmodeled parameter variability, while explicitly modeling nonstationary variability through covariates. We apply this method to rat decision-making in a two-alternative forced-choice (2AFC) visual task, revealing dynamic decision-making states across multiple timescales. A Python implementation of the method is provided.
Saini, S.; Narayanan, R.
Motivation: Information flow and temporal coding in cortical circuits depend critically on the reliable transmission of precisely timed synchronous spike patterns. Although cortical assemblies achieve such transmission despite pronounced intrinsic heterogeneities and stochastic high-conductance states, the mechanisms underlying effective synchrony propagation under in vivo conditions remain poorly understood. Methodology: In this study, we address this gap using large-scale, conductance-based models of excitatory and inhibitory neurons organized into feedforward synfire chains operating in noisy, high-conductance regimes. Using independent stochastic search algorithms, we first identified physiologically valid heterogeneous populations of cortical neurons. Both excitatory and inhibitory populations exhibited cellular-scale degeneracy, whereby distinct combinations of biophysically identified molecular components produced signature physiological characteristics. We then constructed synfire chains with varying degrees of heterogeneity using these populations and assessed the propagation of different spike packets across neuronal assemblies. Results: We found synchrony propagation to be inherently probabilistic, revealing a stochastic separatrix that separated input patterns that consistently succeeded from those that consistently failed in propagation. The stochastic nature of this separatrix highlighted a critical role for background synaptic fluctuations, defining a regime in which identical inputs alternately propagated or failed across trials solely due to stochastic background activity. Comparing networks with different degrees of intrinsic heterogeneity, we found that increasing heterogeneity did not alter mean propagation efficacy but reduced network-to-network variability, indicating a stabilizing role for intrinsic diversity.
Strikingly, when we tested the impact of neuronal intrinsic properties on synchrony propagation, hyperpolarization-activated cyclic nucleotide-gated (HCN) channels emerged as robust enhancers of synchrony propagation across all heterogeneity regimes. Mechanistically, the slow restorative kinetics of HCN conductances narrowed the temporal window for spike initiation, sharpening output synchrony and improving propagation reliability. This effect was abolished when HCN kinetics were accelerated, underscoring the importance of the slow negative feedback mediated by these channels. Implications: Together, our analyses identify HCN channels as key regulators of synchronous information transfer and reveal strong interactions among intrinsic conductances, input characteristics, neuronal heterogeneity, and stochastic background activity in shaping cortical synchrony propagation. The ability of diverse cellular and network configurations to achieve similar propagation efficacy further highlights degeneracy as a fundamental principle governing robust and flexible neural computation.
Casajuana, B.; Casals-Franch, R.; Lopez Garcia de Lomana, A.; Marti-Puig, P.; Villa-Freixa, J.
Parameter estimation in nonlinear biological dynamical systems is a difficult inverse problem because the governing equations are often stiff or oscillatory, the data are sparse and noisy, and the objective landscape is non-convex. Physics-informed neural networks (PINNs) offer an alternative to purely simulation-based calibration by representing state trajectories with neural networks while penalizing violations of the governing equations. This paper studies the empirical reliability of PINNs for recovering the parameters of the repressilator, a synthetic genetic oscillator formed by three cyclically repressive genes. We use synthetic time-series generated from the standard ordinary differential equation model and train inverse PINNs to estimate the production parameter β and the Hill coefficient n. The study varies observation noise, partial observation of repressors, sampling density, sensitivity to initial parameter guesses, and the difference between stable and oscillatory regimes. The results show that PINNs can reconstruct trajectories accurately when the model structure is correct and the three repressors are observed, but parameter recovery is more fragile than trajectory fitting. Noise, sparse sampling, unobserved variables, and unfavorable initial guesses increase the risk of biased estimates. The stable regime is easier to reconstruct, whereas the oscillatory regime provides richer information but also exposes optimization sensitivity. These findings support PINNs as a useful reverse-engineering tool for small gene-regulatory ODE models, while highlighting the need for repeated runs, uncertainty reporting, and experimental designs that improve identifiability.
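To make the setting concrete, the following is a minimal sketch of a forward simulation of a repressilator, the kind of synthetic time series such a study would fit. It assumes the reduced, dimensionless protein-only form dx_i/dt = β / (1 + x_{i-1}^n) - x_i, which may differ from the exact model the authors used; the function names, the RK4 integrator, and all parameter values are illustrative assumptions, not taken from the preprint.

```python
def repressilator_rhs(x, beta, n):
    """Right-hand side of an assumed protein-only repressilator.

    x is a sequence of three repressor levels; repressor i is repressed
    by the previous repressor in the cycle (index i - 1, wrapping around).
    """
    return [beta / (1.0 + x[(i - 1) % 3] ** n) - x[i] for i in range(3)]


def simulate(x0, beta, n, dt=0.01, steps=5000):
    """Integrate the model with a fixed-step classical Runge-Kutta (RK4) scheme.

    Returns a list of (x1, x2, x3) tuples of length steps + 1.
    """
    x = list(x0)
    traj = [tuple(x)]
    for _ in range(steps):
        k1 = repressilator_rhs(x, beta, n)
        k2 = repressilator_rhs([x[i] + 0.5 * dt * k1[i] for i in range(3)], beta, n)
        k3 = repressilator_rhs([x[i] + 0.5 * dt * k2[i] for i in range(3)], beta, n)
        k4 = repressilator_rhs([x[i] + dt * k3[i] for i in range(3)], beta, n)
        x = [x[i] + dt * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0
             for i in range(3)]
        traj.append(tuple(x))
    return traj
```

In this reduced form, a low Hill coefficient (for example n = 1 with β = 2) relaxes to a symmetric fixed point, while steeper repression with sufficiently large β can destabilize that fixed point and produce sustained oscillations. An inverse fit would treat β and n as unknowns and compare simulated trajectories with noisy samples of `traj`.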
Fumagalli, F.; Spigler, G.
Bacteriophage therapy offers a potential route to treat antibiotic-resistant Klebsiella pneumoniae infections, but its use is limited by the narrow specificity of phage-host interactions. In Klebsiella, adsorption is largely determined by receptor-binding proteins (RBPs) that recognize bacterial capsular polysaccharides, yet current machine learning approaches often represent whole phages rather than the individual proteins that mediate recognition. Here, we ask whether adsorption can be predicted at the level of single RBPs and whether the resulting models can identify the molecular features responsible for host specificity. Using experimentally validated Klebsiella phage-host interactions, we extended the PhageHostLearn framework from averaged phage-level representations to individual RBP-level predictions. We found that single-RBP models recover the predictive performance of strain-level models when host capsule identity is explicitly represented. However, models trained only on interaction-level labels did not reliably distinguish motif-bearing RBPs from other viral proteins, indicating that protein-level inputs alone are insufficient for mechanistic interpretability. To resolve this ambiguity, we identified serotype-specific conserved motifs among RBPs from phages infecting the same capsular type. Structural modelling showed that these motifs localize to exposed regions of RBPs and resemble carbohydrate-binding modules. Incorporating motif information into a relabelled training scheme improved prioritization of motif-bearing RBPs while preserving interaction-level predictive power. We further identified a candidate multi-motif RBP from phage S8c that may recognize multiple capsular serotypes. Together, these results support a modular model of Klebsiella phage adsorption in which conserved sub-protein elements drive capsule recognition. 
More broadly, this work shows how protein-level machine learning combined with biological constraints can move beyond accurate phage-host prediction toward mechanistic identification of host-range determinants. Author summary: Bacteriophages (viruses that infect bacteria) are being explored as alternatives to antibiotics, especially against drug-resistant pathogens such as Klebsiella pneumoniae. The challenge is specificity: each phage attaches to only a narrow range of bacterial strains, recognising them through proteins on its tail that bind the bacterium's protective sugar capsule. Choosing or engineering the right phage for a given infection therefore requires understanding what these recognition proteins actually do. We asked whether a machine learning model could move beyond predicting which phages infect a given strain and start identifying which protein on the phage drives that recognition. Prediction alone, we found, is not enough: a model can be accurate without pointing to the responsible protein. To bridge this gap, we searched for short shared sequences among recognition proteins from phages that infect bacteria with the same capsule type, and used these shared patterns to guide the model. This combination correctly prioritised the recognition protein far more often than chance. One phage protein, from phage S8c, carried patterns matching five different capsule types, suggesting a candidate broadly recognising protein for future experimental study.
Lee, P. C.; Snedden, C. E.; Morris, D. H.; Lloyd-Smith, J.
Dose-response modeling provides estimates of infectious and lethal doses, which can be used to inform control and prevention measures. Unfortunately, data from experimental challenge studies, which are needed to perform dose-response modeling, are often sparse. For example, non-human primate (NHP) challenge studies tend to have small sample sizes and little dose variation, often with only one or two dose levels per study. Thus, it is infeasible to apply traditional dose-response modeling approaches to data from single NHP studies. To address this challenge, we developed a mechanistic Bayesian model that aggregates and analyzes NHP pathogen load data across multiple studies. Our model links dose-infectivity to pathogen kinetics, which allows us to estimate the infectious dose and evaluate dose effects on within-host viral kinetics simultaneously. With this model, we obtained the first-ever ID50 estimate for SARS-CoV-1 in NHPs using data compiled from six NHP challenge studies. Our work demonstrates the value in reusing previous data from animal experiments. Our modeling framework can be applied to other pathogens, enabling robust dose-response inference when individual challenge studies are inconclusive.
Dal-Castel, P. C.; Resnick, J. D.; Sluka, J. P.; Gallagher, M. E.; Helfers, M.; Bird, I. M.; Ratcliff, J. D.; Grady, S. L.; Glazier, J. A.
In the respiratory epithelium, interferon (IFN)-induced antiviral resistance acts as a defense against infection. Influenza A viruses (IAVs) have evolved multiple strategies to counteract these defenses, including expression of the viral protein NS1, which inhibits both IFN production and the IFN-mediated transcription of Interferon Stimulated Genes (ISG) in infected cells. However, experiments show that this inhibition is imperfect, especially at a low multiplicity of infection (MOI). One hypothesis to describe this phenomenon relies on the presence of Semi-infectious Particles (SIPs) that fail to express NS1. In this scenario, the IFN response is incompletely suppressed at low MOI, while it is successfully inhibited at high MOI because most cells are infected by multiple virions, allowing complementation to rescue NS1 expression. To test this hypothesis, we developed a computer simulation that models viral gene defects and complementation. We compared the model outputs with in vitro experiments at different MOIs. To assess inter-host reproducibility and calibrate the model parameters, we measured IFN levels and viral load over time in bronchial epithelial cell cultures from five human donors. We observed no statistically significant heterogeneity in IFN response or virus production between donors, and the calibrated simulation fits the experimental time series for IFN and viral load. Consistent with literature (1,2), the model predicted higher IFN levels at low MOI than at high MOI. Finally, simulations of IFN treatment applied before and during infection showed reduced viral load, in agreement with our experiments. Increasing the viral genome defect rate above the experimentally estimated rate increased IFN levels and reduced viral load. High MOI simulations showed lower cumulative IFN levels, while NS1 knockout recovered high IFN levels. 
These results demonstrate the ability of mechanistic models of viral dynamics to predict the innate immune response of epithelial cells during viral infection. Author summary: Respiratory viruses such as influenza A are highly infectious and pose significant challenges for the human immune system. Through laboratory experiments and computer simulations, we investigated how cells in the respiratory epithelium defend themselves and their neighbors against infection. Using cells collected from different donors, we generated 3-dimensional cell cultures that mimic human airways and measured how they respond to IAV. When a tissue was initially exposed to a small amount of virus, cells could successfully slow or stop the spread of the infection. This phenomenon is hypothesized to be due in part to the high error rate in IAV replication, resulting in many viral particles that are not fully functional. We recapitulated this experimental result with our computational model, validating the model design and parameter estimates. We then simulated a scenario in which cells were pre-treated with interferon, a protective cytokine important to early immune response, and showed that this pre-treatment could successfully limit infection. Laboratory experiments subsequently confirmed this predicted behavior. The computational model reproduced key observations across infection conditions and identified nonfunctional viral particles as important drivers of the early immune response.
Dupeuble, F.; Berry, H.; Denizot, A.
A growing number of studies indicate the possible involvement of astrocytes in triggering or modulating neurovascular coupling (NVC), i.e. the local dilation of blood vessels in the brain in response to neuronal activity. Astrocytes possess specialized subcellular compartments, named endfeet, that surround arterioles and capillaries, ideally positioned to mediate NVC. Various vasodilators have been shown to contribute to NVC, such as epoxyeicosatrienoic acid (EET), nitric oxide (NO), or prostaglandin E2 (PGE2), but the precise mechanisms underlying NVC and their variability remain to be fully elucidated. In particular, the involvement of astrocytes in this process is controversial. Recent translatome and proteomics data reveal that astrocytes and in particular endfeet are enriched in the proteins of the PGE2 pathway. However, how the latter could contribute to NVC remains to be characterized. Here, we develop a computational model of astrocyte-mediated NVC that recapitulates these findings and describes Ca2+ and PGE2 signaling in astrocytes, NO release by neurons, and arteriole diameter dynamics using ordinary differential equations. The model successfully reproduces the dynamics of arteriole diameter change during hyperemia from in vivo neocortical recordings in awake mice. Our simulations suggest that the astrocyte PGE2 pathway could be responsible for the late response of NVC at the arteriolar level. We further observe that PIP2-derived diacylglycerol plays a major role in driving arteriole diameter dynamics in our model, while phosphatidic acid-derived diacylglycerol, which is calcium-dependent, mainly acts as an amplifier of this response. Finally, a spatial implementation of the model using a simplified astrocyte geometry suggests that NVC is more efficient when synaptic stimulation occurs at the endfoot level rather than at other astrocytic compartments. 
Overall, this computational study suggests a partial role for astrocyte-mediated PGE2 release in NVC and points to astrocyte perivascular processes as sub-compartments that are ideally positioned and equipped to mediate NVC. Author summary: In the brain, the local blood flow is regulated to meet neuronal energy demand by modulating the dilation of neighboring blood vessels. The mechanisms driving this process, known as neurovascular coupling (NVC), remain debated and are likely to differ depending on the physiological context. Recent evidence points to astrocytes, a cell type possessing specialized protrusions called "endfeet" that envelop the entire brain vascular tree. Contacts between synapses and endfeet have recently been reported, positioning the latter as ideal mediators of NVC. Here, we developed a computational model that simulates the signaling between neurons, astrocytes, and blood vessels. Our model successfully reproduces experimental recordings of blood vessel dilation in the brains of awake mice. Our simulations suggest that a specific signaling pathway in astrocytes, involving a molecule called prostaglandin E2, is a key driver of the late phase of NVC, occurring a few seconds after neuronal activity. Furthermore, our model indicates that the location of the stimulated synapses matters: signals sent to the astrocyte endfeet are particularly effective at controlling blood flow. This work helps clarify the active role of astrocytes in brain blood flow regulation, a process critical for healthy brain function.
Dima, S. S.; Reeves, G. T.
During Drosophila embryogenesis, Bicoid (Bcd) forms a gradient that provides positional information to regulate target genes along the anteroposterior axis. To understand how the information provided by the Bcd gradient drives gene expression in a concentration-dependent manner, the subpopulations of Bcd participating in gene regulation need to be characterized. Therefore, to understand the mechanism of Bcd-mediated gene regulation, we quantified the absolute concentration of the nuclear subpopulations of Bcd, such as freely diffusing, DNA-bound, and clustered Bcd. Each of these populations has distinct diffusivities and DNA-binding properties and is proposed to be crucial for gene regulation. This quantification allowed us to construct a global dose/response relationship between the free and DNA-bound concentration of Bcd. Our data show that Bcd/DNA binding is strongly correlated with the free concentration of Bcd, indicated by the dose/response relationship being in the linear regime, despite the barrier presented by nucleosomes. Our data are quantitatively consistent with a Monod-Wyman-Changeux model in which transcription factors passively compete with nucleosomes for DNA binding. We further apply this model to Bcd/DNA interactions at the enhancer/promoter for hunchback (hb), a Bcd target gene which has a steep posterior boundary despite being driven primarily by the graded Bcd concentration, a conundrum which has been under scrutiny for decades. We show that, using parameters determined from the global dose/response relationship, a reversible multistate promoter model, in which the promoter activation rate is determined by Bcd binding to the hb enhancer, can successfully recapitulate features of hb transcriptional dynamics, including the sharp posterior boundary.
Therefore, this work sheds light on the mechanism of hb regulation by Bcd and provides a potential experimental/computational pipeline that bridges the input of global properties of transcription factors to the transcriptional output of specific target genes.
Xu, Y.; Pai, N.; Wayment-Steele, H. K.
Genomic language models (gLMs) trained only on large-scale nucleic acid sequence data seem to capture signals of RNA structure, yet the specifics of how remain unclear. Using the categorical Jacobian (CJ) operation, a model-agnostic operation for querying pairwise dependencies, we systematically compared three flagship gLMs: RNA-FM, Evo 2, and gLM2. We found that CJ signals recover base pairs supported by evolutionary covariation analyses, consistent with findings in protein language models. Surprisingly, CJ also recovers base pairs lacking evolutionary support but predicted by biophysical nearest-neighbor models. Is it possible gLMs have "learned" RNA thermodynamics? We noticed nearest-neighbor RNA folding models often predict reflected structures when given reversed sequences, consistent with these models' modular and grammar-like nature. We leveraged this observation to create a simple "mirror test" that we found gLMs routinely fail, indicating they have not learned generalizable biophysics-based rules for RNA structure. Nevertheless, their apparent thermodynamic signal potentially confounds interpreting gLM pairwise dependencies as evidence of evolutionary conservation. We therefore introduce a method using synthetic sequences as a control for detecting significant learned signal. Our results demonstrate that gLMs can mimic thermodynamics through learned sequence context rather than general physical principles, but solutions exist for disentangling patterns in language models.
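One way to read the "mirror test" idea is sketched below, under the assumption that a "reflected structure" means each base pair (i, j) of a length-N sequence maps to (N - 1 - j, N - 1 - i) when the sequence is reversed. The dot-bracket representation, the function names, and the overlap score are illustrative choices and are not taken from the preprint; how pairs are extracted from a model (for example from a categorical Jacobian map) is left out.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of base pairs (i, j), i < j, in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def mirror_pairs(pairs, length):
    """Reflect base pairs about the sequence midpoint: (i, j) -> (N-1-j, N-1-i)."""
    return {(length - 1 - j, length - 1 - i) for i, j in pairs}


def mirror_dot_bracket(structure):
    """Reflect a dot-bracket string: reverse it and swap opening and closing brackets."""
    swap = {"(": ")", ")": "(", ".": "."}
    return "".join(swap[ch] for ch in reversed(structure))


def mirror_score(pairs_forward, pairs_reversed, length):
    """Fraction of reflected forward pairs that are also found for the reversed sequence.

    A predictor that behaves like a reflection-symmetric folding model would
    score near 1; a score near 0 means the reversed-sequence prediction does
    not mirror the forward one.
    """
    expected = mirror_pairs(pairs_forward, length)
    if not expected:
        return 0.0
    return len(expected & pairs_reversed) / len(expected)
```

For example, the structure `((..)).` reflects to `.((..))`, so a predictor that returns the latter for the reversed sequence would obtain a score of 1.0 on this toy case.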
Carannante, I.; Dlima, N.; Destexhe, A.; Jirsa, V.; Bedoui, M. H.; Depannemaecker, D.
Epileptic seizures emerge from pathological synchronization in neuronal networks and are strongly influenced by circadian rhythms. Here, we developed a computational framework to investigate how circadian modulation of excitation/inhibition (E/I) balance shapes transitions from physiological to pathological activity. The model consists of interacting excitatory and inhibitory populations containing varying proportions of impaired neurons with altered intrinsic excitability. Circadian effects were incorporated through modulation of synaptic time constants, mimicking daily fluctuations in E/I dynamics. Network activity was characterized using firing rates and the Spike Time Tiling Coefficient (STTC), enabling simultaneous assessment of excitability and synchrony. Our results show that seizure-like dynamics arise from nonlinear interactions between network composition, neuronal impairment, and synaptic kinetics. Distinct dynamic regimes emerged, separated by sharp transitions in synchrony and activity patterns. These findings provide a mechanistic link between circadian regulation and seizure susceptibility, supporting the development of chronotherapy approaches for epilepsy.
Alrefae, T. A.; Pons-Salort, M.; Donnelly, C. A.; Lambert, B.; Kamau, E.
Serological assays remain the standard experimental approach for estimating the cumulative incidence of a pathogen and monitoring population immunity. The predominant approach for analysing serum titration data from virus neutralisation assays uses a nearly century-old interpolation-based method which neglects inherent imperfections in the assay and produces estimates with no measure of uncertainty. We introduce a two-part Bayesian modelling framework to estimate the underlying antibody concentrations in the raw serum samples taken from serosurveyed individuals, to improve the interpretation of serological data over age. First, we develop a mechanistic Bayesian model for serum antibody titration data that estimates latent antibody concentrations while accounting for assay variability and quantifying uncertainty. Second, we propagate this uncertainty into an age-structured serocatalytic model by integrating over posterior draws of individual antibody concentrations, allowing joint inference on latent serostate membership, force of infection, and serological waning rate. We use this framework to explore the dynamics of infection and immunity for three enterovirus serotypes: enteroviruses A71 (EV-A71) and D68 (EV-D68) and coxsackievirus A6 (CVA6). These serotypes are leading causes of outbreaks of severe respiratory illness and hand, foot, and mouth disease. Applying these approaches to three cross-sectional serosurveys, we estimated consistently higher and more persistent antibody concentrations throughout life for EV-D68 compared to EV-A71 and CVA6. Our analysis suggests that the proportion of recently infected individuals (i.e. individuals with high estimated antibody concentration levels given their age) peaks around 25% by age 7 years for both EV-A71 and CVA6 before gradually declining with age. In contrast, for EV-D68 the inferred proportion of the population in the infected state exceeds 50% by age 9 years and continues to grow with age.
We also estimate that EV-D68 antibody concentration levels are higher than those of the other two serotypes, with the force of infection estimated to be highest in early childhood and declining more gradually with age than for EV-A71 and CVA6. These estimates differ from previous estimates in the literature. Our inferential framework uncovers the wide-ranging variation in antibody levels that is often obscured by conventional endpoint titre estimation methods. We demonstrate that our framework can infer infection rates without relying on predetermined seropositivity cut-offs and without making explicit assumptions of virus-specific infection mechanisms. Author summary: Serological tests measure antibody levels in blood to show how widely a virus has spread and how well populations are protected. Titre-based tests dilute blood samples in steps, mix these dilutions with virus, and add the mixture to living cells; the titre is the highest dilution where antibodies still protect cells from infection. Traditional analyses overlook test imperfections. We present a new two-part Bayesian framework to estimate antibody levels and track age-related exposure to infection. First, we estimate underlying antibody concentrations while accounting for uncertainty, then use these estimates in another model to infer age-specific transmission of three common viruses: EV-A71, EV-D68, and CVA6. Our results show that EV-D68 infections may be more common, especially in children, compared to the other viruses. This new approach provides a clearer picture of the dynamics of seroconversion, without relying on arbitrary thresholds, helping to improve public health monitoring and responses.
Murphy, M. R.; Nielsen, R.; Perkins, A.; Greenhouse, B.
Motivation: Molecular surveillance and infectious disease transmission network reconstruction can provide compelling evidence for estimating public-health quantities that are difficult to observe directly, including importation, source-sink structure, and differences in onward transmission across locations or intervention strata. These quantities can be expressed as functions of the underlying transmission network, but individual transmission events are rarely observed and many networks may be consistent with the same data. Existing transmission network reconstruction methods leveraging genetic data are often built for settings in which each infection has one dominant source, one representative haplotype, and mutation-driven genetic divergence along transmission chains. These assumptions are poorly matched to polyclonal infections, in which hosts carry multiple genetically distinct clones and recipient infections may reflect contributions from multiple sources. Such infections are common in malaria, tuberculosis, HIV, and many parasitic infections. Methods are needed that can accommodate these data. Results: We present a modular Bayesian framework for estimating directed transmission on sampled cases, where an infection may have no sampled parent, one parent, or several parents, including sources outside the observed panel. Pathogen-specific modules supply likelihoods over candidate parent sets and connect to shared inference that yields marginal directed edge probabilities, posterior mean out-degree, and inclusion probabilities for unobserved parents. We demonstrate our framework with Plasmotrack, a transmission network model for Plasmodium falciparum that uses targeted amplicon sequencing data. We implemented these components with a per-locus allele-mixture transmission likelihood, an amplicon genotyping error model, and data augmentation allowing for unobserved parents.
Simulations from a biologically informed generative model, under which the inferential per-locus allele-mixture likelihood is misspecified, showed recovery of aggregate network summaries including mean out-degree and mean unobserved-source inclusion, alongside high precision and recall for detecting directed transmission. Other pathogens can reuse the same modular composition after substituting transmission and observation likelihoods. Availability: The Plasmotrack software and documentation are available at https://github.com/eppicenter/plasmotrack. Source code and example datasets are provided under an open-source license. Contact: maxwell.murphy@ucsf.edu
Meduri, R.; Satish, A. L.; Singh, U.
Selective deployment of multiple transcription start sites is a major regulatory feature of human transcriptomes. FANTOM CAGE data exhibit a near-universal TSS deployment parsimony which is disrupted in cancers. We have recently shown that TSS deployment is sensitive to gene function, futile upstream transcription, and cellular biosynthetic states. Patterns in FANTOM CAGE data can reveal mechanisms underlying TSS co-deployments. We propose and test the possibility that some TSSs act like epromoters and act as co-varying hubs of transcriptional activities for multiple other promoters. Using deep analysis of CAGE data implemented through neural networks, we show that non-cancers implement transcription co-deployments through cores of epromoter-like TSSs which are generally proximal to their start codons. These TSSs show enhancer-like TFBS profiles. A comparison with cancer CAGE data shows that the concentrated epromoter core is disrupted in cancers, with multiple distal TSSs replacing the proximal TSS cores. We provide evidence that the core TSSs are rich in YY1 and CTCF binding sites and associated with genes coding for transcription factors. Our findings show that covariance of TSS deployment is sensitive to transcriptional resource cost and that a parsimonious design of TSS co-deployments depends on proximal TSSs in non-cancers, a mechanism grossly disrupted in cancers.
Highlights:
- Heterogeneous FANTOM CAGE data contain universal patterns of TSS co-deployments.
- TSS co-deployments exhibit a parsimonious "core-covariant" scheme which is disrupted in cancers.
- Core TSSs are enriched in transcription factor binding sites and gene functions which justify biological features of the samples.
- The DL pipeline we present identifies the core-covariant TSS sets in an unbiased manner.
Dhananjanie, A.; Thompson, H.; Vercelloni, J.; Warne, D. J.
Explainable machine learning (ML) methods are gaining increasing attention in environmental and ecological research for their ability to reveal relationships between environmental drivers and population dynamics. However, questions remain about the reliability of these tools, especially given that recent research shows these explanations can be highly sensitive to model architecture. In ecology, it is typical to use a single ML model, and a comparative evaluation of how sensitive explanations are to the choice of ML approach is often overlooked. In this paper, we develop a novel framework that quantifies explanation consistency between multiple ML model architectures. This framework provides a discrepancy measure for each model prediction, with high discrepancy indicating substantive explanation disagreement across models and low discrepancy indicating strong consensus in explanations across models. We then demonstrate that low explanation discrepancy aligns well with the ground truth mechanism. Furthermore, high explanation discrepancy provides a mechanism to identify areas for model refinement and further investigation by domain experts. We do this using a simulation study based on synthetic coral cover data that incorporate spatio-temporal variability driven by known disturbance effects. Our method provides a quantitative approach to assess the sensitivity of explainable ML in the absence of ground truth. As a result, it enhances the utility of ML approaches in conservation and ecological management. While we focus primarily on ecological modelling for coral reefs, our methods are generally applicable to other ecological and environmental modelling settings.
Oros, D.; Krug, J.
With the increasing availability of large-scale empirical fitness landscape data, there is a need for simple yet informative null models that can be used to interpret metrics of landscape ruggedness and navigability. A natural choice of a null model that maximizes ruggedness in a statistical sense assigns independent and identically distributed fitness values to the genotypes, a setting often referred to as the House-of-Cards (HoC) or mutational landscape model. In this work we examine the navigability of these landscapes, as quantified by the mean size of the adaptive basins of local fitness peaks. The adaptive basin is the set of genotypes from which a peak can be reached via selectively accessible, i.e., strictly fitness-increasing, mutational paths. Building on recent rigorous results on the statistics of accessible paths, we show that the adaptive basins in the HoC landscape encompass a positive fraction of all genotypes that is an analytically computable, increasing function of the number of alleles per site. For the four-letter nucleotide alphabet, an average peak basin contains 52.8% of all genotypes. When conditioned on peak fitness, the expected basin size increases linearly with fitness rank. The exact results on adaptive basins are complemented by an approximate analysis of gradient basins formed by greedy adaptive paths, which maximize the fitness increase in each step. We argue that recent reports of large adaptive basins in empirical fitness landscapes should be reinterpreted in the light of our findings.
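The adaptive-basin definition in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes uniform i.i.d. fitness values, single-site mutations as the neighborhood, and a basin that includes the peak itself, and all function names are hypothetical. It builds a House-of-Cards landscape, finds local peaks, and collects each basin by a reverse search along strictly fitness-decreasing steps.

```python
import itertools
import random


def hoc_landscape(num_sites, num_alleles, rng):
    """Assign i.i.d. uniform fitness values to every genotype (HoC model)."""
    genotypes = itertools.product(range(num_alleles), repeat=num_sites)
    return {g: rng.random() for g in genotypes}


def neighbors(genotype, num_alleles):
    """Yield all genotypes that differ from `genotype` at exactly one site."""
    for i, allele in enumerate(genotype):
        for b in range(num_alleles):
            if b != allele:
                yield genotype[:i] + (b,) + genotype[i + 1:]


def local_peaks(fitness, num_alleles):
    """Genotypes whose fitness exceeds that of every one-mutant neighbor."""
    return [g for g, f in fitness.items()
            if all(fitness[n] < f for n in neighbors(g, num_alleles))]


def adaptive_basin(peak, fitness, num_alleles):
    """Genotypes that can reach `peak` via strictly fitness-increasing paths.

    Searches backwards from the peak: a lower-fitness neighbor of any basin
    member can step uphill into the basin, so it belongs to the basin too.
    Assumption: the peak itself is counted as part of its basin.
    """
    basin = {peak}
    stack = [peak]
    while stack:
        g = stack.pop()
        for n in neighbors(g, num_alleles):
            if fitness[n] < fitness[g] and n not in basin:
                basin.add(n)
                stack.append(n)
    return basin


def mean_basin_fraction(num_sites, num_alleles, seed=0):
    """Mean basin size over all local peaks, as a fraction of all genotypes."""
    rng = random.Random(seed)
    fitness = hoc_landscape(num_sites, num_alleles, rng)
    peaks = local_peaks(fitness, num_alleles)
    sizes = [len(adaptive_basin(p, fitness, num_alleles)) for p in peaks]
    return sum(sizes) / (len(peaks) * len(fitness))


if __name__ == "__main__":
    # Small example with the four-letter alphabet; short sequences are not
    # expected to match the analytical value quoted in the abstract.
    print(mean_basin_fraction(num_sites=5, num_alleles=4, seed=1))
```

Because the landscape is enumerated exhaustively, this approach is only practical for short sequences; estimates for a given alphabet size would need to be averaged over many seeds.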
Inoue, K.-i.; Ishii, Y.; Hariyama, M.
Interdependent multicellular circuits must maintain stable coexistence despite competition for shared environmental resources. Fibroblast-macrophage circuits represent a conserved signaling architecture in which fibroblasts produce colony-stimulating factor 1 (CSF) to support macrophages, whereas macrophages produce platelet-derived growth factor (PDGF) to support fibroblasts. Previous analytical models proposed receptor-mediated endocytosis as a stabilizing negative-feedback mechanism, but these formulations assumed spatial homogeneity and independently assigned carrying capacities. Here, we constructed a spatial agent-based fibroblast-macrophage circuit model using PhysiCell to investigate how PDGF and CSF endocytosis regulate circuit stability under explicit competition for shared oxygen and space. Fibroblasts and macrophages competed for common environmental resources supplied by spatially distributed capillary sources, allowing carrying capacity to emerge dynamically from local resource competition. Across nine enhancer conditions spanning fourfold variation in PDGF and CSF signaling strength, heterotypic coexistence remained broadly achievable regardless of endocytic activity. In contrast, endocytosis strongly suppressed stochastic circuit failure. This stabilization depended critically on macrophage CSF uptake, whereas broad ranges of fibroblast PDGF uptake produced comparable outcomes, generating a sloppy stabilization landscape along the PDGF uptake axis. Mechanistically, excessive CSF signaling drove macrophage overexpansion, depletion of shared resources, and eventual fibroblast extinction. Importantly, despite fundamentally different carrying-capacity assumptions from previous analytical models, both frameworks converged on the same systems-level conclusion: stabilization of the macrophage-supporting CSF axis is substantially more critical than stabilization of the PDGF axis. These results identify endocytosis as a robustness mechanism that suppresses catastrophic failure in interdependent multicellular circuits under shared-resource competition without requiring precise parameter tuning.
Ren, N.; Mankili, A.; Stevenson, I. H.
Although long-term synaptic plasticity is often studied using controlled electrical or optogenetic stimulation, it also occurs spontaneously during natural, ongoing brain activity. Here, using large-scale spike recordings in mice from the Allen Institute Visual Coding Neuropixels dataset, we examine to what extent fluctuations in putative synaptic efficacy can be tracked and explained by models of activity-dependent long-term plasticity. We first detect putative excitatory synaptic connections within the hippocampus based on cross-correlations between the spike trains of thousands of pairs of neurons. Most of these putative connections are between presynaptic neurons with broad spike waveforms and postsynaptic neurons with narrow spike waveforms and are consistent with synapses from excitatory to inhibitory units. For the subset of pairs where a transient, excitatory effect was detected, we use a model-based approach to track fluctuations in synaptic efficacy. Previous work found that these fluctuations can be partially predicted from pre- and postsynaptic firing rates and models of short-term plasticity. Here we model naturally occurring long-term potentiation and depression using rate-based learning rules. We find that modeling the covariance of pre- and postsynaptic activity improves prediction of efficacy fluctuations. We also examine synaptic changes associated with hippocampal sharp-wave ripples, but do not find clear evidence of systematic SWR-associated changes for the putative synapses studied here.
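As a rough illustration of what a rate-based covariance rule looks like, the sketch below applies the textbook form, in which the weight change in each time bin is proportional to the product of the pre- and postsynaptic rate deviations from their means. This is a generic rule and not necessarily the model fitted in the preprint; the function name, learning rate, and use of whole-recording means are all assumptions.

```python
def covariance_rule_trace(pre_rates, post_rates, learning_rate=0.01, w0=1.0):
    """Track a synaptic weight under a generic rate-based covariance rule.

    Update per time bin: w += learning_rate * (pre - mean_pre) * (post - mean_post).
    Assumption: the means are taken over the whole recording rather than a
    running window, which keeps the sketch short.
    """
    n = len(pre_rates)
    if n == 0 or n != len(post_rates):
        raise ValueError("rate sequences must be non-empty and equal in length")
    pre_mean = sum(pre_rates) / n
    post_mean = sum(post_rates) / n

    w = w0
    trace = []
    for pre, post in zip(pre_rates, post_rates):
        # Co-fluctuating rates potentiate; opposing fluctuations depress.
        w += learning_rate * (pre - pre_mean) * (post - post_mean)
        trace.append(w)
    return trace


if __name__ == "__main__":
    correlated = covariance_rule_trace([1, 2, 3, 4], [1, 2, 3, 4])
    anticorrelated = covariance_rule_trace([1, 2, 3, 4], [4, 3, 2, 1])
    print(correlated[-1], anticorrelated[-1])
```

Positively correlated rate fluctuations raise the weight above its starting value and anticorrelated fluctuations lower it, which is the qualitative behavior a covariance term adds beyond firing rates alone.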
Mori, K.; Yamada, M.
The willingness to exert cognitive effort is essential but is constrained by the subjective cost of effort. Although effortful tasks are often avoided, positive bias about one's own performance may help sustain engagement with cognitive demands. Here, participants completed an effort-based decision-making task and reported trial-by-trial predictions of their own performance, allowing us to quantify performance prediction error (PPE) as the discrepancy between subjective and objective accuracy. The results showed that PPE was predominantly positive and increased with effort level, indicating greater overestimation under higher cognitive demands. Using a computational model, we show that choices were best explained by a learning model in which rewarded trials accompanied by positive PPE decreased subsequent sensitivity to effort. A confidence-based control model did not provide a better account of choices, suggesting that this effect was better captured by positive performance bias than by confidence alone. Our findings provide a computational account of how biased self-evaluation may attenuate the subjective cost of cognitive effort and extend the positive bias literature to tasks that require cognitive effort.